{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Output Fixing Parser\n",
    "\n",
    "- Author: [Jeongeun Lim](https://www.linkedin.com/in/jeongeun-lim-808978188/)\n",
    "- Peer Review : [Junseong Kim](https://www.linkedin.com/in/%EC%A4%80%EC%84%B1-%EA%B9%80-591b351b2/)\n",
    "- Proofread : [Two-Jay](https://github.com/Two-Jay)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/03-OutputParser/08-OutputFixingParser.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/03-OutputParser/08-OutputFixingParser.ipynb)\n",
    "## Overview\n",
    "\n",
    "The ```OutputFixingParser``` in LangChain provides an automated mechanism for correcting errors that may occur during the output parsing process. This parser is designed to wrap around another parser, such as the ```PydanticOutputParser```, and intervenes when the underlying parser encounters outputs that are malformed or do not conform to the expected format. It achieves this by leveraging additional LLM calls to fix the errors and ensure proper formatting.\n",
    "\n",
    "At its core, the ```OutputFixingParser``` addresses situations where the initial output does not comply with a predefined schema. If such an issue arises, the parser automatically detects the formatting errors and submits a new request to the model, including specific instructions for correcting the issue. These instructions highlight the problem areas and provide clear guidelines for restructuring the data in the correct format.\n",
    "\n",
    "This functionality is particularly useful in scenarios where strict adherence to a schema is critical. For example, when using the ```PydanticOutputParser``` to generate outputs conforming to a particular data schema, issues such as missing fields or incorrect data types might occur. \n",
    "\n",
    "- The ```OutputFixingParser``` steps in as follows:\n",
    "\n",
    "1. **Error Detection** : It recognizes that the output does not meet the schema requirements.\n",
    "2. **Error Correction** : It generates a follow-up request to the LLM with explicit instructions to address the issues.\n",
    "3. **Reformatted Output with Specific Instructions** : The ```OutputFixingParser``` ensures that the correction instructions precisely identify the errors, such as missing fields or incorrect data types. The instructions guide the LLM to reformat the output to meet the schema requirements accurately.\n",
    "\n",
    "\n",
    "---- \n",
    "Practical Example:\n",
    "\n",
    "Suppose you are using the ```PydanticOutputParser``` to enforce a schema requiring specific fields like ```name``` (string), ```age``` (integer), and ```email``` (string). If the LLM produces an output where the ```age``` field is missing or the ```email``` field is not a valid string, the ```OutputFixingParser``` automatically intervenes. It would issue a new request to the LLM with detailed instructions such as:\n",
    "\n",
    "- \"The output is missing the ```age``` field. Add an appropriate integer value for ```age```.\"\n",
    "- \"The ```email``` field contains an invalid format. Correct it to match a valid email string.\"\n",
    "\n",
    "This iterative process ensures the final output conforms to the specified schema without requiring manual intervention.\n",
    "\n",
    "\n",
    "---- \n",
    "Key Benefits: \n",
    "\n",
    "- **Error Recovery**: Automatically handles malformed outputs without requiring user input.\n",
    "- **Enhanced Accuracy**: Ensures outputs conform to predefined schemas, reducing the risk of inconsistencies.\n",
    "- **Streamlined Workflow**: Minimizes the need for manual corrections, saving time and improving efficiency.\n",
    "\n",
    "\n",
    "---- \n",
    "Implementation Steps: \n",
    "\n",
    "To use the ```OutputFixingParser``` effectively, follow these steps:\n",
    "\n",
    "1. **Wrap a Parser**: Instantiate the ```OutputFixingParser``` with another parser, such as the ```PydanticOutputParser```, as its base.\n",
    "2. **Define the Schema**: Specify the schema or format that the output must adhere to.\n",
    "3. **Enable Error Correction**: Allow the ```OutputFixingParser``` to detect and correct errors automatically through additional LLM calls, ensuring that correction instructions precisely identify and address issues for accurate reformatting.\n",
    "\n",
    "By integrating the ```OutputFixingParser``` into your workflow, you can ensure robust error handling and maintain consistent output quality in your LangChain applications.\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environment Setup](#environment-setup)\n",
    "- [Define Data Model and Set Up PydanticOutputParser](#define-data-model-and-set-up-pydanticoutputparser)\n",
    "- [Using OutputFixingParser to Correct Incorrect Formatting](#using-outputfixingparser-to-correct-incorrect-formatting)\n",
    "\n",
    "### References\n",
    "\n",
    "- [LangChain API Reference](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.fix.OutputFixingParser.html)\n",
    "- [Pydantic Docs](https://docs.pydantic.dev/latest/api/base_model/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
    "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "%pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\n",
    "        \"langsmith\",\n",
    "        \"langchain\",\n",
    "        \"langchain_openai\",\n",
    "        \"langchain_community\",\n",
    "    ],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "# Set environment variables\n",
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        \"OPENAI_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"08-OutputFixingParser\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can alternatively set ```OPENAI_API_KEY``` in ```.env``` file and load it. \n",
    "\n",
    "[Note] This is not necessary if you've already set ```OPENAI_API_KEY``` in previous steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv(override=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Data Model and Set Up PydanticOutputParser\n",
    "\n",
    "- The Actor class is defined using the Pydantic model, where name and film_names are fields representing the actor's name and a list of films they starred in.\n",
    "- The ```PydanticOutputParser``` is used to parse outputs into an Actor object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import PydanticOutputParser\n",
    "from pydantic import BaseModel, Field\n",
    "from typing import List\n",
    "\n",
    "\n",
    "# Define the Actor class using Pydantic\n",
    "class Actor(BaseModel):\n",
    "    name: str = Field(description=\"name of an actor\")\n",
    "    film_names: List[str] = Field(description=\"list of names of films they starred in\")\n",
    "\n",
    "\n",
    "# A query to generate the filmography for a random actor\n",
    "actor_query = \"Generate the filmography for a random actor.\"\n",
    "\n",
    "# Use PydanticOutputParser to parse the output into an Actor object\n",
    "parser = PydanticOutputParser(pydantic_object=Actor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Attempt to Parse Misformatted Input Data\n",
    "\n",
    "- The misformatted variable contains an incorrectly formatted string, which does not match the expected structure (using ' instead of \").\n",
    "- Calling parser.parse() will result in an error because of the format mismatch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "ename": "OutputParserException",
     "evalue": "Invalid json output: {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}\nFor troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE ",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mJSONDecodeError\u001b[0m                           Traceback (most recent call last)",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\json.py:83\u001b[0m, in \u001b[0;36mJsonOutputParser.parse_result\u001b[1;34m(self, result, partial)\u001b[0m\n\u001b[0;32m     82\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 83\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mparse_json_markdown\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtext\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     84\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m JSONDecodeError \u001b[38;5;28;01mas\u001b[39;00m e:\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\utils\\json.py:144\u001b[0m, in \u001b[0;36mparse_json_markdown\u001b[1;34m(json_string, parser)\u001b[0m\n\u001b[0;32m    143\u001b[0m     json_str \u001b[38;5;241m=\u001b[39m json_string \u001b[38;5;28;01mif\u001b[39;00m match \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m match\u001b[38;5;241m.\u001b[39mgroup(\u001b[38;5;241m2\u001b[39m)\n\u001b[1;32m--> 144\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_parse_json\u001b[49m\u001b[43m(\u001b[49m\u001b[43mjson_str\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparser\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mparser\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\utils\\json.py:160\u001b[0m, in \u001b[0;36m_parse_json\u001b[1;34m(json_str, parser)\u001b[0m\n\u001b[0;32m    159\u001b[0m \u001b[38;5;66;03m# Parse the JSON string into a Python dictionary\u001b[39;00m\n\u001b[1;32m--> 160\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mparser\u001b[49m\u001b[43m(\u001b[49m\u001b[43mjson_str\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\utils\\json.py:118\u001b[0m, in \u001b[0;36mparse_partial_json\u001b[1;34m(s, strict)\u001b[0m\n\u001b[0;32m    115\u001b[0m \u001b[38;5;66;03m# If we got here, we ran out of characters to remove\u001b[39;00m\n\u001b[0;32m    116\u001b[0m \u001b[38;5;66;03m# and still couldn't parse the string as JSON, so return the parse error\u001b[39;00m\n\u001b[0;32m    117\u001b[0m \u001b[38;5;66;03m# for the original string.\u001b[39;00m\n\u001b[1;32m--> 118\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mjson\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloads\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstrict\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstrict\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\json\\__init__.py:359\u001b[0m, in \u001b[0;36mloads\u001b[1;34m(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)\u001b[0m\n\u001b[0;32m    358\u001b[0m     kw[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mparse_constant\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m parse_constant\n\u001b[1;32m--> 359\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkw\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdecode\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\json\\decoder.py:337\u001b[0m, in \u001b[0;36mJSONDecoder.decode\u001b[1;34m(self, s, _w)\u001b[0m\n\u001b[0;32m    333\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Return the Python representation of ``s`` (a ``str`` instance\u001b[39;00m\n\u001b[0;32m    334\u001b[0m \u001b[38;5;124;03mcontaining a JSON document).\u001b[39;00m\n\u001b[0;32m    335\u001b[0m \n\u001b[0;32m    336\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m--> 337\u001b[0m obj, end \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraw_decode\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43midx\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_w\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mend\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    338\u001b[0m end \u001b[38;5;241m=\u001b[39m _w(s, end)\u001b[38;5;241m.\u001b[39mend()\n",
      "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\json\\decoder.py:353\u001b[0m, in \u001b[0;36mJSONDecoder.raw_decode\u001b[1;34m(self, s, idx)\u001b[0m\n\u001b[0;32m    352\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m--> 353\u001b[0m     obj, end \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mscan_once\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43midx\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    354\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
      "\u001b[1;31mJSONDecodeError\u001b[0m: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[1;31mOutputParserException\u001b[0m                     Traceback (most recent call last)",
      "Cell \u001b[1;32mIn[6], line 3\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;66;03m# Intentionally input misformatted data\u001b[39;00m\n\u001b[0;32m      2\u001b[0m misformatted \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m{\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mname\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m: \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mTom Hanks\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mfilm_names\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m: [\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mForrest Gump\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m]}\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m----> 3\u001b[0m \u001b[43mparser\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmisformatted\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m      5\u001b[0m \u001b[38;5;66;03m# An error will be printed because the data is incorrectly formatted\u001b[39;00m\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\pydantic.py:83\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse\u001b[1;34m(self, text)\u001b[0m\n\u001b[0;32m     74\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mparse\u001b[39m(\u001b[38;5;28mself\u001b[39m, text: \u001b[38;5;28mstr\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m TBaseModel:\n\u001b[0;32m     75\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"Parse the output of an LLM call to a pydantic object.\u001b[39;00m\n\u001b[0;32m     76\u001b[0m \n\u001b[0;32m     77\u001b[0m \u001b[38;5;124;03m    Args:\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m     81\u001b[0m \u001b[38;5;124;03m        The parsed pydantic object.\u001b[39;00m\n\u001b[0;32m     82\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[1;32m---> 83\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtext\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\json.py:97\u001b[0m, in \u001b[0;36mJsonOutputParser.parse\u001b[1;34m(self, text)\u001b[0m\n\u001b[0;32m     88\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mparse\u001b[39m(\u001b[38;5;28mself\u001b[39m, text: \u001b[38;5;28mstr\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Any:\n\u001b[0;32m     89\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"Parse the output of an LLM call to a JSON object.\u001b[39;00m\n\u001b[0;32m     90\u001b[0m \n\u001b[0;32m     91\u001b[0m \u001b[38;5;124;03m    Args:\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m     95\u001b[0m \u001b[38;5;124;03m        The parsed JSON object.\u001b[39;00m\n\u001b[0;32m     96\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[1;32m---> 97\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse_result\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43mGeneration\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtext\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtext\u001b[49m\u001b[43m)\u001b[49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\pydantic.py:72\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse_result\u001b[1;34m(self, result, partial)\u001b[0m\n\u001b[0;32m     70\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m partial:\n\u001b[0;32m     71\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m---> 72\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\pydantic.py:67\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse_result\u001b[1;34m(self, result, partial)\u001b[0m\n\u001b[0;32m     54\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Parse the result of an LLM call to a pydantic object.\u001b[39;00m\n\u001b[0;32m     55\u001b[0m \n\u001b[0;32m     56\u001b[0m \u001b[38;5;124;03mArgs:\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m     64\u001b[0m \u001b[38;5;124;03m    The parsed pydantic object.\u001b[39;00m\n\u001b[0;32m     65\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m     66\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 67\u001b[0m     json_object \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse_result\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresult\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     68\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_parse_obj(json_object)\n\u001b[0;32m     69\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m OutputParserException \u001b[38;5;28;01mas\u001b[39;00m e:\n",
      "File \u001b[1;32mc:\\Users\\Machine_K\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\langchain-opentutorial-fOxWcZdD-py3.11\\Lib\\site-packages\\langchain_core\\output_parsers\\json.py:86\u001b[0m, in \u001b[0;36mJsonOutputParser.parse_result\u001b[1;34m(self, result, partial)\u001b[0m\n\u001b[0;32m     84\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m JSONDecodeError \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m     85\u001b[0m     msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalid json output: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtext\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m---> 86\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m OutputParserException(msg, llm_output\u001b[38;5;241m=\u001b[39mtext) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n",
      "\u001b[1;31mOutputParserException\u001b[0m: Invalid json output: {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}\nFor troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE "
     ]
    }
   ],
   "source": [
    "# Intentionally input misformatted data\n",
    "misformatted = \"{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}\"\n",
    "parser.parse(misformatted)\n",
    "\n",
    "# An error will be printed because the data is incorrectly formatted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using OutputFixingParser to Correct Incorrect Formatting\n",
    "### Set Up OutputFixingParser to Automatically Correct the Error\n",
    "- ```OutputFixingParser``` wraps around the existing ```PydanticOutputParser``` and automatically fixes errors by making additional calls to the LLM.\n",
    "- The from_llm() method connects ```OutputFixingParser``` with ```ChatOpenAI``` to correct the formatting issues in the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain.output_parsers import OutputFixingParser\n",
    "\n",
    "# Define a custom prompt to provide the fixing instructions\n",
    "fixing_prompt = PromptTemplate(\n",
    "    template=(\n",
    "        \"The following JSON is incorrectly formatted or incomplete: {completion}\\n\"\n",
    "    ),\n",
    "    input_variables=[\n",
    "        \"completion\",\n",
    "    ],\n",
    ")\n",
    "\n",
    "# Use OutputFixingParser to automatically fix the error\n",
    "new_parser = OutputFixingParser.from_llm(\n",
    "    parser=parser, llm=ChatOpenAI(model=\"gpt-4o\"), prompt=fixing_prompt\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}\""
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Misformatted Output Data\n",
    "misformatted"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parse the Misformatted Output Using OutputFixingParser\n",
    "- The new_parser.parse() method is used to parse the misformatted data. OutputFixingParser will correct the errors in the data and generate a valid Actor object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "actor = new_parser.parse(misformatted)\n",
    "\n",
    "# Attempt to parse the misformatted JSON with Exception Handling\n",
    "# try:\n",
    "#     actor = new_parser.parse(misformatted)\n",
    "#     print(\"Parsed actor:\", actor)\n",
    "# except Exception as e:\n",
    "#     print(\"Error while parsing:\", e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check the Parsed Result\n",
    "- After parsing, the result is a valid Actor object with the corrected format. The errors in the initial misformatted string have been automatically fixed by OutputFixingParser."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Actor(name='Tom Hanks', film_names=['Forrest Gump'])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "actor"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-opentutorial-fOxWcZdD-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
